Evaluation of probabilistic queries over imprecise data in constantly-evolving environments
Authors
Abstract
Sensors are often employed to monitor continuously changing entities such as the locations of moving objects and temperature. The sensor readings are reported to a database system and are subsequently used to answer queries. Due to continuous changes in these values and limited resources (e.g., network bandwidth and battery power), the database may not be able to keep track of the actual values of the entities. Queries that use these old values may produce incorrect answers. However, if the degree of uncertainty between the actual data value and the database value is limited, one can place more confidence in the answers to the queries. More generally, query answers can be augmented with probabilistic guarantees of the validity of the answers. In this paper, we study probabilistic query evaluation based on uncertain data. A classification of queries is made based upon the nature of the result set. For each class, we develop algorithms for computing probabilistic answers, and provide efficient indexing and numeric solutions. We address the important issue of measuring the quality of the answers to these queries, and provide algorithms for efficiently pulling data from relevant sensors or moving objects in order to improve the quality of the executing queries. Extensive experiments are performed to examine the effectiveness of several data update policies.
A shorter version of this paper appeared in SIGMOD 2003 (http://www.cs.purdue.edu/homes/ckcheng/papers/sigmod03.pdf). This manuscript contains, among others, the following new material: (1) a new section (Section 5) on efficient evaluation of probabilistic queries, where disk-based uncertainty indexing and numerical methods are examined; (2) new sets of experimental results (Section 7) in a more realistic simulation model; (3) a method based on time-series analysis for obtaining a probability density function in the uncertainty model (Appendix A); (4) enhancement of probabilistic query evaluation algorithms to handle special cases of uncertainty (Appendix B); and (5) discussions on future work in Section 9, as well as more detailed examples.
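As a rough illustration of what a probabilistic answer can look like, the sketch below evaluates a range query over readings whose true value is only known to lie in a bounded interval. It is not the algorithm from the paper: it assumes a uniform probability density over each uncertainty interval, and the names `UncertainReading`, `range_probability`, and `probabilistic_range_query` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UncertainReading:
    """A stored sensor value known only to lie within [low, high] (assumed model)."""
    sensor_id: str
    low: float
    high: float


def range_probability(r: UncertainReading, a: float, b: float) -> float:
    """Probability that the true value lies in [a, b], assuming a uniform pdf on [low, high]."""
    width = r.high - r.low
    if width <= 0:
        # Degenerate interval: the value is known exactly.
        return 1.0 if a <= r.low <= b else 0.0
    # Length of the overlap between the uncertainty interval and the query range.
    overlap = min(r.high, b) - max(r.low, a)
    return max(0.0, overlap) / width


def probabilistic_range_query(readings, a, b, threshold=0.0):
    """Return (sensor_id, probability) pairs whose probability exceeds the threshold."""
    results = []
    for r in readings:
        p = range_probability(r, a, b)
        if p > threshold:
            results.append((r.sensor_id, p))
    return results


readings = [
    UncertainReading("s1", 20.0, 30.0),
    UncertainReading("s2", 28.0, 32.0),
    UncertainReading("s3", 40.0, 45.0),
]
answer = probabilistic_range_query(readings, 25.0, 35.0)
```

Under these assumptions, each answer carries a probability rather than a yes/no result, and a narrower uncertainty interval (for example, after a fresh sensor update) pushes that probability toward 0 or 1.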
Similar resources
U-DBMS: A Database System for Managing Constantly-Evolving Data
In many systems, sensors are used to acquire information from external environments such as temperature, pressure and locations. Due to continuous changes in these values, and limited resources (e.g., network bandwidth and battery power), it is often infeasible for the database to store the exact values at all times. Queries that use these old values can produce invalid results. In order to ma...
Efficient Evaluation of HAVING Queries on a Probabilistic Database
We study the evaluation of positive conjunctive queries with Boolean aggregate tests (similar to HAVING queries in SQL) on probabilistic databases. Our motivation is to handle aggregate queries over imprecise data resulting from information integration or information extraction. More precisely, we study conjunctive queries with predicate aggregates using MIN, MAX, COUNT, SUM, AVG or COUNT(DISTI...
Evaluating Continuous Probabilistic Queries Over Imprecise Sensor Data
Pervasive applications, such as natural habitat monitoring and location-based services, have attracted plenty of research interest. These applications deploy a large number of sensors (e.g. temperature sensors) and positioning devices (e.g. GPS) to collect data from external environments. Very often, these systems have limited network bandwidth and battery resources. The sensors also cannot reco...
Continuous Probabilistic Count Queries in Wireless Sensor Networks
Count queries in wireless sensor networks report the number of sensor nodes for which the measured values satisfy a given query predicate. However, measurements in wireless sensor networks are typically imprecise due to limited accuracy of the sensor hardware or fluctuations in the observed environment. Consequently, queries performed on this imprecise information yield imprecise answers. ...
Scalable Statistical Modeling and Query Processing over Large Scale Uncertain Databases
Title of Dissertation: Scalable Statistical Modeling and Query Processing over Large Scale Uncertain Databases. Bhargav Kanagal Shamanna, Doctor of Philosophy, 2011. Dissertation directed by: Dr. Amol Deshpande, Dept. of Computer Science. The past decade has witnessed a large number of novel applications that generate imprecise, uncertain and incomplete data. Examples include monitoring infrastructu...
Journal title:
- Inf. Syst.
Volume 32, Issue
Pages -
Publication date: 2007